Lecture 04

Author

Bill Perry

Lecture 3: Review

  • Introduction to histograms or frequency distributions
  • Probability Distribution Functions (PDF)
  • Descriptive Statistics
    • Center - mean, median, mode

    • Spread - range, variance, standard deviation

Our last graphs

Practice Exercise 1: Recreating Our Last Histograms

Let’s recreate the basic histogram of fish lengths from our last class. Use the sculpin_df data frame that’s already loaded.

# Write your code here to create a histogram of fish lengths from Toolik Lake
# Remember to use the pipe operator %>% and ggplot with geom_histogram()

Lecture 4: Lecture Overview

The objectives:

  • Introduction to hypothesis testing
  • The standard normal distribution
  • Standard error
  • Confidence intervals
  • Student’s t-distribution
  • H testing sequence
  • p-values

Lecture 4: Standard normal distribution

To understand hypothesis testing, we first need to understand the standard normal distribution

Recall - sculpin in Toolik Lake

  • n = 208

  • mean = 51.69 mm

  • std dev = s = 12.02 mm

  • Length distribution ~normal

Practice Exercise 2: Compare Fish Distributions from Different Lakes

Let’s look at what lakes are in our dataframe:

# View the unique lake names
unique(sculpin_df$lake)
[1] "E 01"   "E 05"   "NE 12"  "NE 14"  "S 06"   "S 07"   "Toolik"

Now, select two lakes and create a comparison of their fish length distributions using facet_grid():

# Your code here to compare fish lengths between two lakes of your choice

Lecture 4: Standard normal distribution

You want to know things about this population like

  • probability of a fish in the lake having a certain length (e.g., > 60 mm)
  • Can solve this by integrating under curve
  • But it is tedious to do every time
  • Instead
    • we can use the standard normal distribution (SND)

Lecture 4: Standard normal distribution

Standard Normal Distribution

  • “benchmark” normal distribution with µ = 0, σ = 1
  • The Standard Normal Distribution is defined so that:
    • ~68% of the curve area within +/- 1 σ of the mean,

    • ~95% within +/- 2 σ of the mean,

    • ~99.7% within +/- 3 σ of the mean

*remember σ = standard deviation
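
These percentages can be checked in R. A minimal sketch, assuming only base R's pnorm(); the variable names are illustrative:

```r
# Area of the standard normal curve within k standard deviations of the mean
k <- 1:3
area_within <- pnorm(k) - pnorm(-k)

round(area_within, 4)
# [1] 0.6827 0.9545 0.9973
```

pnorm(k) gives the area to the left of k, so subtracting pnorm(-k) leaves the area between -k and +k.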

Lecture 4: Standard normal distribution

Areas under curve of Standard Normal Distribution

  • Have been calculated for a range of z-values
  • Can be looked up in z-table
  • No need to integrate
  • Any normally distributed data can be standardized
    • transformed into the standard normal distribution
    • looked up in a table

Lecture 4: Standard normal distribution

Done by converting original data points to z-scores

  • Z-scores calculated as:

\(z = \frac{x_i - \mu}{\sigma}\)

  • z = z-score for observation
  • xi = original observation
  • µ = mean of data distribution
  • σ = SD of data distribution

Practice Exercise 3: Calculate Z-Scores

Let’s practice calculating z-scores for fish lengths. Calculate the z-scores for:

  1. A fish that is 25 mm long
  2. A fish that is at the mean length (approximately 51.7 mm)
  3. A fish that is 60 mm long
# Calculate z-scores for these three fish lengths
# Remember the formula: Z = (value - mean) / standard deviation

Now use your calculated z-scores to determine roughly where these fish fall in the overall distribution:

  • What percentage of fish are smaller than the 25 mm fish?
  • What percentage of fish are larger than the 60 mm fish?

# Use the pnorm() function to calculate these percentages

Lecture 4: Standard normal distribution

Thus:

  • z-score = (value - mean) / s
  • z-score of 25mm = (25 - 51.7) / 12 = -2.225
  • z-score of 51.7mm = (51.7 - 51.7) / 12 = 0
  • z-score of 60mm = (60 - 51.7) / 12 = 0.6916667
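
The same arithmetic can be done in one vectorized step in R. This is a sketch that uses the rounded summary values from the slide (mean 51.7, s 12) rather than the full data set; the variable names are illustrative:

```r
# Rounded summary values from the slide (assumed inputs, not recomputed from data)
fish_mean <- 51.7
fish_sd <- 12

# Lengths to standardize (mm)
lengths_mm <- c(25, 51.7, 60)

# z = (value - mean) / standard deviation, applied to every length at once
z_scores <- (lengths_mm - fish_mean) / fish_sd
round(z_scores, 3)
# [1] -2.225  0.000  0.692

# Percent of fish expected to be shorter than 25 mm and longer than 60 mm,
# assuming lengths are approximately normal
pct_smaller_25 <- pnorm(z_scores[1]) * 100
pct_larger_60 <- (1 - pnorm(z_scores[3])) * 100
```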

Lecture 4: Standard normal distribution

Area under curve (probability) of standard normal distribution is known relative to z-values

Knowing z-value, can figure out corresponding area under the curve

What is the area under curve < 0?

Lecture 4: Standard normal distribution

  • Here is z-score table for right side or positive values of the z distribution (z > 0)

  • Numbers give area under the curve to left of a particular z-score

  • For example, 60 mm has a z-score of about 0.69

Lecture 4: Standard normal distribution

Area under curve (probability) of standard normal distribution is known relative to z-values

Knowing z-value, can figure out corresponding area under the curve

What is the area under curve < 0?

  • 0.5 of the area of the curve is contained to the left of z = 0.00

Lecture 4: Standard normal distribution

Area of the curve contained to the left of z = 1.22:

  • 0.8888 or 88.9%

Practice Exercise 4: Working with Probabilities

Using the z-table or the built-in R function pnorm(), answer these questions:

  1. What percentage of fish in Toolik Lake are expected to be longer than 65 mm?
# Convert 65 mm to a z-score
z_65mm <- (65 - toolik_result$mean) / toolik_result$sd

# Find the probability of a fish being larger than 65 mm
prob_larger_than_65mm <- 1 - pnorm(z_65mm)
prob_larger_than_65mm * 100  # Convert to percentage
[1] 13.4254
  2. Between what two lengths would you expect to find the middle 90% of fish in Toolik Lake?
# Find the z-scores for the 5th and 95th percentiles
z_5 <- qnorm(0.05)
z_95 <- qnorm(0.95)

# Convert these z-scores back to fish lengths
length_5 <- toolik_result$mean + z_5 * toolik_result$sd
length_95 <- toolik_result$mean + z_95 * toolik_result$sd

c(length_5, length_95)  # The middle 90% of fish lengths
[1] 31.91600 71.47343

Lecture 4: Standard normal distribution

What area of the curve is contained between z = 0 and z = 1.5?

  • approximately 0.4332 (or 43.32%)

To calculate this from a standard normal table:

To find the area under the standard normal curve between 0 and 1.5 using this table:

  • Locate z = 1.5 in the table - 0.9332.
    • represents P(Z ≤ 1.5) - probability Z is less than or equal to 1.5
  • Since need area between 0 and 1.5 - need to subtract P(Z ≤ 0) from P(Z ≤ 1.5)
  • From table - P(Z ≤ 0) = 0.5000.
  • Therefore, the area between 0 and 1.5 is: 0.9332 - 0.5000 = 0.4332.
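
The table lookup above can be mirrored in R, where pnorm() returns the area to the left of a z-value. A minimal sketch with an illustrative variable name:

```r
# Area between z = 0 and z = 1.5 is P(Z <= 1.5) minus P(Z <= 0)
area_between <- pnorm(1.5) - pnorm(0)

round(area_between, 4)
# [1] 0.4332
```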

Lecture 4: Standard normal distribution

What area of the curve is contained to the left of z = -1?

  • Locate row for 1.0 - (table shows absolute value of z) and the column for .00
    • value = 0.8413 - represents P(Z ≤ 1.0)
  • However want P(Z ≤ -1.0)
    • need to use the symmetry property of the standard normal distribution:

    • P(Z ≤ -1.0) = 1 - P(Z ≤ 1.0) = 1 - 0.8413 = 0.1587

Therefore, 15.87% of area falls to the left of z = -1.0
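
The symmetry argument can be checked in R. In this sketch the direct lookup and the symmetry calculation should agree; the variable names are illustrative:

```r
# Direct lookup of the left tail
p_left <- pnorm(-1)

# Same area obtained from the positive side of the table by symmetry
p_symmetry <- 1 - pnorm(1)

round(c(p_left, p_symmetry), 4)
# [1] 0.1587 0.1587
```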

Lecture 4: Standard error

Take random samples from fish population:

  • 3 random samples (each n=20) from population:

  • Notice the sample statistics and distributions

Practice Exercise 5: Sampling Distributions

Let’s explore how sample size affects our estimates by taking samples of different sizes:

# Set seed for reproducibility
set.seed(456)

# Create samples of different sizes
small_sample <- toolik_df %>% sample_n(10)
medium_sample <- toolik_df %>% sample_n(30)
large_sample <- toolik_df %>% sample_n(100)

# Calculate mean and standard error for each sample
small_mean <- mean(small_sample$total_length_mm, na.rm = TRUE)
small_se <- sd(small_sample$total_length_mm, na.rm = TRUE) / sqrt(10)

medium_mean <- mean(medium_sample$total_length_mm, na.rm = TRUE)
medium_se <- sd(medium_sample$total_length_mm, na.rm = TRUE) / sqrt(30)

large_mean <- mean(large_sample$total_length_mm, na.rm = TRUE)
large_se <- sd(large_sample$total_length_mm, na.rm = TRUE) / sqrt(100)

# Create a data frame with the results
results <- data.frame(
  Sample_Size = c(10, 30, 100),
  Mean = c(small_mean, medium_mean, large_mean),
  SE = c(small_se, medium_se, large_se)
)

# Display the results
results
  Sample_Size     Mean       SE
1          10 58.50000 2.260531
2          30 51.27083 1.656559
3         100 52.22973 1.062795

What do you observe about the standard error as sample size increases? Why does this happen?

Lecture 4: Standard error

Every sample gives slightly different estimate of µ

  • Can take many samples and calculate means
  • plot the frequency distribution of means
  • get the “sampling distribution of means”

Lecture 4: Standard error

3 important properties:

  • Sampling distribution of means (SDM) from normal population will be normal
  • Sampling distribution of means from large samples of any population will be approximately normal (Central Limit Theorem)
  • The mean of the sampling distribution of means will equal µ, the population mean
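
These properties can be explored by simulation. The sketch below uses a hypothetical, strongly skewed population (exponential with mean 50), not the sculpin data, so every number in it is an assumption chosen for illustration:

```r
set.seed(1)

# Hypothetical skewed population (NOT the sculpin data): exponential, mean 50
population <- rexp(100000, rate = 1 / 50)

# Draw 2000 random samples of n = 40 and keep the mean of each one
sample_means <- replicate(2000, mean(sample(population, size = 40)))

# The mean of the sample means should sit close to the population mean
mean(population)
mean(sample_means)

# The spread of the sample means should be close to sd(population) / sqrt(n)
sd(sample_means)
sd(population) / sqrt(40)

# The histogram of sample means looks roughly normal even though
# the population itself is strongly skewed
hist(sample_means, breaks = 30,
     main = "Sampling distribution of means (n = 40)",
     xlab = "Sample mean")
```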

Lecture 4: Standard error

Given above

  • can estimate the standard deviation of sample means

  • “Standard error of sample mean”

  • How good is your estimate of population mean? (based on the sample collected)

  • quantifies how much the sample means are expected to vary from sample to sample

  • gives an estimate of the error associated with using \(\bar{y}\) to estimate \(\mu\)

Lecture 4: Standard error

\(\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}\)

but rarely know σ, so use s

\(s_{\bar{y}} = \frac{s}{\sqrt{n}}\)

Where:

  • \(s_{\bar{y}}\) = sample standard error of mean
  • s = sample standard deviation
  • n = sample size

Lecture 4: Standard error

Notice:

  • \(s_{\bar{y}}\) depends on
    • sample s (standard deviation)
    • sample n
    • (\(s_{\bar{y}} = \frac{s}{\sqrt{n}}\))

How and why?

  • Decreases with sample n (number of observations)
  • Increases with sample s (standard deviation)

  • Large sample, low s = greater confidence in estimate of \(\mu\)
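
To see how the standard error shrinks with n, here is a small sketch that holds s fixed at the Toolik value (12.02 mm) and varies the sample size. The sample sizes are illustrative choices:

```r
s <- 12.02                  # sample standard deviation from the Toolik summary
n <- c(10, 30, 100, 208)    # illustrative sample sizes

# Standard error of the mean: s / sqrt(n)
se <- s / sqrt(n)

data.frame(n = n, se = round(se, 2))
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.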

Lecture 4: Confidence intervals

Every sample gives slightly different estimate of µ (population mean)

Want to know how accurate our estimate of µ is from a sample

Do this by calculating confidence interval:

  • Range of values that will contain the true population mean with a certain probability

Practice Exercise 6: Calculating Confidence Intervals

Let’s calculate 95% confidence intervals for the mean fish length in Toolik Lake:

# Calculate the standard error for Toolik Lake
toolik_se <- toolik_result$sd / sqrt(toolik_result$count)

# Calculate the 95% confidence interval using the normal distribution
# (since our sample size is large)
toolik_ci_lower <- toolik_result$mean - 1.96 * toolik_se
toolik_ci_upper <- toolik_result$mean + 1.96 * toolik_se

# Print the results
cat("Mean fish length in Toolik Lake:", round(toolik_result$mean, 1), "mm\n")
Mean fish length in Toolik Lake: 51.7 mm
cat("95% Confidence Interval:", round(toolik_ci_lower, 1), "to", round(toolik_ci_upper, 1), "mm\n")
95% Confidence Interval: 50.1 to 53.3 mm

Now choose another lake and calculate its 95% confidence interval:

# Choose another lake (e.g., "E 01")
my_lake <- "E 01"  # You can change this to any lake in the dataset

# Filter data for your chosen lake
my_lake_data <- sculpin_df %>% filter(lake == my_lake)

# Calculate mean, standard deviation, and standard error
my_lake_stats <- my_lake_data %>%
  summarize(
    mean = mean(total_length_mm, na.rm = TRUE),
    sd = sd(total_length_mm, na.rm = TRUE),
    n = sum(!is.na(total_length_mm)),
    se = sd / sqrt(n)
  )

# Calculate 95% confidence interval
my_lake_ci_lower <- my_lake_stats$mean - 1.96 * my_lake_stats$se
my_lake_ci_upper <- my_lake_stats$mean + 1.96 * my_lake_stats$se

# Print results
cat("Mean fish length in", my_lake, ":", round(my_lake_stats$mean, 1), "mm\n")
Mean fish length in E 01 : 58.2 mm
cat("95% Confidence Interval:", round(my_lake_ci_lower, 1), "to", round(my_lake_ci_upper, 1), "mm\n")
95% Confidence Interval: 54.8 to 61.6 mm

Do the confidence intervals for these two lakes overlap? What does this suggest about the difference between the fish populations?

Lecture 4: Confidence intervals

Often calculate 95% CIs

  • Interpret 95% CI to mean:
    • Range of values that contains µ (population mean) with 95% probability
  • More correctly:
    • If we took 100 samples from population
    • calculate a CI from each
    • about 95 of the 100 CIs are expected to contain the true population mean - µ

Lecture 4: Confidence intervals

Formula for confidence interval

\(\text{95% CI} = \bar{y} \pm z \cdot \frac{\sigma}{\sqrt{n}}\)

Where:

  • ȳ is the sample mean
  • 𝑛 is the sample size
  • σ is the population standard deviation
  • z is the z-value corresponding to the probability of the CI

95% of the probability of the SND lies between z = -1.96 and z = 1.96

So for:

  • 95% CI z = 1.960
  • 90% CI z = 1.645
  • 99% CI z = 2.576
  • And so on….
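
These z-values do not need to be memorized; qnorm() returns them. A minimal sketch, with illustrative variable names:

```r
conf_level <- c(0.90, 0.95, 0.99)

# A two-sided interval leaves (1 - conf_level) / 2 in each tail,
# so the critical z is the quantile at 1 - (1 - conf_level) / 2
z_crit <- qnorm(1 - (1 - conf_level) / 2)

round(z_crit, 3)
# [1] 1.645 1.960 2.576
```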

Lecture 4: Confidence intervals

In the more typical case we DON'T know the population σ, so we estimate it from the sample s

When we don't know the population σ (and especially when sample size is < ~30), we can't use the standard normal (z) distribution

Instead, we use Student’s t distribution

Lecture 4: Student’s t-distribution

Student’s t distribution similar to SND

  • changes depending on degrees of freedom (df= n-1)
  • t distribution more “conservative”
    • smaller n is, the more conservative the t distribution is

At df = ~30 - t distribution becomes close to z distribution
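
One way to see the t distribution approach the z distribution is to compare critical values for a 95% two-sided interval as df grows. This is a sketch; the df values are arbitrary illustrative choices:

```r
deg_free <- c(2, 5, 10, 30, 100, 1000)

# Critical values for a two-sided 95% interval
t_crit <- qt(0.975, df = deg_free)
z_crit <- qnorm(0.975)

# t is always larger than z (more conservative), and the gap shrinks as df grows
data.frame(df = deg_free,
           t_crit = round(t_crit, 3),
           z_crit = round(z_crit, 3))
```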

Practice Exercise 7: Using the t-Distribution

When working with small samples or when the population standard deviation is unknown, we use the t-distribution. Let’s take a small sample and calculate a confidence interval using the t-distribution:

# Set seed for reproducibility
set.seed(789)

# Take a small sample of 15 fish from Toolik Lake
small_sample <- toolik_df %>% sample_n(15)

# Calculate sample statistics
small_mean <- mean(small_sample$total_length_mm, na.rm = TRUE)
small_sd <- sd(small_sample$total_length_mm, na.rm = TRUE)
small_n <- 15
small_se <- small_sd / sqrt(small_n)

# Calculate degrees of freedom
df <- small_n - 1

# Find the critical t-value for 95% confidence interval
t_critical <- qt(0.975, df)  # 0.975 for a two-tailed 95% CI

# Calculate the confidence interval
small_ci_lower <- small_mean - t_critical * small_se
small_ci_upper <- small_mean + t_critical * small_se

# Print the results
cat("Small sample (n=15) mean:", round(small_mean, 1), "mm\n")
Small sample (n=15) mean: 57.3 mm
cat("t-critical value (df=14):", round(t_critical, 3), "\n")
t-critical value (df=14): 2.145 
cat("95% CI using t-distribution:", round(small_ci_lower, 1), "to", round(small_ci_upper, 1), "mm\n")
95% CI using t-distribution: 52.7 to 61.9 mm
# For comparison, calculate the CI using the normal distribution (z)
z_critical <- 1.96
z_ci_lower <- small_mean - z_critical * small_se
z_ci_upper <- small_mean + z_critical * small_se

cat("95% CI using normal distribution:", round(z_ci_lower, 1), "to", round(z_ci_upper, 1), "mm\n")
95% CI using normal distribution: 53.1 to 61.5 mm

Which confidence interval is wider? Why is this the case?

Lecture 4: Student’s t-distribution

To calculate CI for sample from “unknown” population:

\(\text{CI} = \bar{y} \pm t \cdot \frac{s}{\sqrt{n}}\)

Where:

  • ȳ is sample mean
  • 𝑛 is sample size
  • s is sample standard deviation
  • t is the t-value corresponding to the probability of the CI
  • t is found in the t-table for the relevant degrees of freedom (n - 1)

Lecture 4: Student’s t-distribution

Here is a t-table

  • Values of t that correspond to probabilities
  • Probabilities listed along top
  • Sample dfs are listed in the left-most column
  • Probabilities are given for one-tailed and two-tailed “questions”

Lecture 4: Student’s t-distribution

One-tailed questions: area of distribution to the left (or right) of a certain value

  • n = 20 (df = 19): 90% of the observations are found to the left of t = 1.328 (10% are outside)

Lecture 4: Student’s t-distribution

Two-tailed questions refer to area between certain values

  • n = 20 (df = 19): 90% of the observations are between t = -1.729 and t = 1.729 (10% are outside)
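
The one-tailed and two-tailed table values for df = 19 can be reproduced with qt(). A minimal sketch with illustrative variable names:

```r
# One-tailed: 90% of the distribution lies to the left of this t
t_one_tailed <- qt(0.90, df = 19)
round(t_one_tailed, 3)
# [1] 1.328

# Two-tailed: 90% lies between -t and +t, leaving 5% in each tail
t_two_tailed <- qt(0.95, df = 19)
round(t_two_tailed, 3)
# [1] 1.729
```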

Lecture 4: Student’s t-distribution

Let’s calculate CIs again:

Use two-sided test

  • 95% CI Sample A = 51.7 ± 1.984 * (12/(208^0.5)) = 51.7 ± 1.650788
  • The 95% CI is between 50.05 and 53.35
  • “The 95% CI for the population mean from sample A is 51.7 ± 1.65”
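
The same interval can be computed in R. The sketch below reproduces the slide's arithmetic with the table value t = 1.984 and then, for comparison, uses qt() for the exact df = n - 1. The inputs are the rounded summary values from the slide:

```r
y_bar <- 51.7   # sample mean (rounded, from the slide)
s <- 12         # sample standard deviation (rounded, from the slide)
n <- 208        # sample size

# Margin of error using the t-value quoted on the slide
t_table <- 1.984
margin_table <- t_table * s / sqrt(n)
round(margin_table, 2)
# [1] 1.65

# Margin of error using the exact critical value for df = n - 1
t_exact <- qt(0.975, df = n - 1)
margin_exact <- t_exact * s / sqrt(n)

# Lower and upper limits of the 95% CI from the exact critical value
c(lower = y_bar - margin_exact, upper = y_bar + margin_exact)
```

The exact critical value is slightly smaller than the table value, so the interval from qt() is marginally narrower.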

Lecture 4: Student’s t-distribution

So:

  • Can assess confidence that population mean is within a certain range
  • Can use t distribution to ask questions like:
    • “What is probability of getting sample with mean = ȳ from population with mean = µ?” (1 sample t-test)
    • “What is the probability that two samples came from same population?” (2 sample t-test)

Practice Exercise 8: One-Sample t-Test

Let’s perform a one-sample t-test to determine if the mean fish length in Toolik Lake differs from 50 mm:

# Perform a one-sample t-test
t_test_result <- t.test(toolik_df$total_length_mm, mu = 50)

# View the test results
t_test_result

    One Sample t-test

data:  toolik_df$total_length_mm
t = 2.0326, df = 207, p-value = 0.04337
alternative hypothesis: true mean is not equal to 50
95 percent confidence interval:
 50.05097 53.33845
sample estimates:
mean of x 
 51.69471 

Interpret this test result by answering these questions:

  1. What was the null hypothesis?
  2. What was the alternative hypothesis?
  3. What does the p-value tell us?
  4. Should we reject or fail to reject the null hypothesis at α = 0.05?
  5. What is the practical interpretation of this result for fish biologists?

Lecture 4: Next steps

For example

  • what is the probability that population X is the same as our lake's population?

How would you assess this question using what we learned?

Practice Exercise 9: Two-Sample t-Test

Let’s compare fish lengths between two lakes to see if they differ:

# Choose two lakes to compare
lake1 <- "Toolik"
lake2 <- "E 01"  # Change this to another lake if you prefer

# Filter data for the two lakes
lake1_data <- sculpin_df %>% 
  filter(lake == lake1) %>% 
  pull(total_length_mm)

lake2_data <- sculpin_df %>% 
  filter(lake == lake2) %>% 
  pull(total_length_mm)

# Perform a two-sample t-test
lakes_ttest <- t.test(lake1_data, lake2_data)

# View the results
lakes_ttest

    Welch Two Sample t-test

data:  lake1_data and lake2_data
t = -3.4051, df = 116.36, p-value = 0.0009082
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -10.313036  -2.727921
sample estimates:
mean of x mean of y 
 51.69471  58.21519 

Now create a boxplot to visualize the difference in fish lengths between these lakes:

# Create a boxplot comparing the two lakes
sculpin_df %>%
  filter(lake %in% c(lake1, lake2)) %>%
  ggplot(aes(x = lake, y = total_length_mm, fill = lake)) +
  geom_boxplot() +
  labs(
    title = paste("Comparison of Fish Lengths in", lake1, "and", lake2),
    x = "Lake",
    y = "Length (mm)"
  ) +
  theme_minimal()
Warning: Removed 268 rows containing non-finite outside the scale range
(`stat_boxplot()`).

Based on the t-test results and the boxplot, what can you conclude about the fish populations in these two lakes?

Lecture 4: Next steps

Let’s calculate the 95% CI for population X

Use two-sided test

  • 95% CI Sample X = 54 ± 1.984 * (10.9/(132^0.5)) = 54 ± 1.882267
  • The 95% CI is between 52.12 and 55.88

Notice: the 95% confidence interval does not contain 51.7

  • What does this tell us about population X?

Lecture 4: Statistical hypothesis testing

Major goal of statistics:

  • make inferences about populations from samples
  • assign degree of confidence to inferences

Statistical H-testing:

formalized approach to inference

  • hypotheses ask whether samples come from populations with certain properties
  • often interested in questions about population means (but not only)

Lecture 4: Statistical hypothesis testing

Relies on specifying null hypothesis (Ho) and alternate hypothesis (Ha)

  • Ho is the hypothesis of “no effect”
    • (two samples from population with same mean, sample is from population of mean=0)
  • Ha (research hypothesis) is the opposite of Ho

Practice Exercise 10: Formulating Hypotheses

For the following scenarios, write out the null and alternative hypotheses:

  1. Testing if the mean fish length in Lake S 06 is greater than 50 mm.

  2. Testing if there is a difference in mean fish lengths between lakes Toolik and S 06.

  3. Testing if lake E 01 has a higher variance in fish lengths compared to Lake Toolik.

For each scenario, remember that the null hypothesis typically represents “no effect” or “no difference”, while the alternative hypothesis represents what you are trying to demonstrate.

Lecture 4: Statistical hypothesis testing

  • p = 0.3 means that if Ho is true and the study is repeated 100 times
    • would get this (or more extreme) result due to chance about 30 times
  • p = 0.03 means that if Ho is true and the study is repeated 100 times
    • would get this (or more extreme) result due to chance about 3 times

Which p-value suggests Ho likely false?

Lecture 4: Statistical hypothesis testing

At what point reject Ho?

  • p < 0.05 conventional “significance threshold” (α)

  • p < 0.05 means:

    • if Ho is true - if study repeated 100 times
      • would get this (or more extreme) result less than 5 times due to chance

Lecture 4: Statistical hypothesis testing

  • α is the rate at which we will reject a true null hypothesis (Type I error rate)

  • Lowering α will lower likelihood of incorrectly rejecting a true null hypothesis (e.g., 0.01, 0.001)

  • Both hypotheses and α are specified BEFORE collection of data and analysis
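
The meaning of α as a Type I error rate can be illustrated by simulation. In this sketch Ho is true by construction: every sample is drawn from a hypothetical normal population whose mean really is 50 mm. The population values, sample size, and number of repeats are all illustrative assumptions:

```r
set.seed(42)
alpha <- 0.05

# Repeat the "study" 5000 times. Each time, sample 20 fish from a hypothetical
# population with true mean 50 mm and test Ho: mu = 50, which is true here
p_values <- replicate(
  5000,
  t.test(rnorm(20, mean = 50, sd = 12), mu = 50)$p.value
)

# Proportion of studies that (wrongly) reject the true Ho
type1_rate <- mean(p_values < alpha)
type1_rate
```

The proportion of false rejections should land near α. Rerunning with alpha <- 0.01 should lower it accordingly.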

Lecture 4: Statistical hypothesis testing

Traditionally α=0.05 is used as a cut off for rejecting null hypothesis

Nothing magical about 0.05 - actual p-values need to be reported.

p-value range       Interpretation
P > 0.10            No evidence against Ho - data appear consistent with Ho
0.05 < P < 0.10     Weak evidence against Ho in favor of Ha
0.01 < P < 0.05     Moderate evidence against Ho in favor of Ha
0.001 < P < 0.01    Strong evidence against Ho in favor of Ha
P < 0.001           Very strong evidence against Ho in favor of Ha

Lecture 4: Statistical hypothesis testing

Fisher:

p-value as informal measure of discrepancy between data and Ho

“If p is between 0.1 and 0.9 there is certainly no reason to suspect the hypothesis tested. If it is below 0.02 it is strongly indicated that the hypothesis fails to account for the whole of the facts. We shall not often be astray if we draw a conventional line at .05 …”

Lecture 4: Statistical hypothesis testing

General procedure for H testing:

  • Specify Null (Ho) and alternate (Ha)
  • Determine test (and test statistic) to be used
  • Test statistic is used to compare your data to expectation under Ho (null hypothesis)
  • Specify significance level (α): the p-value threshold below which Ho will be rejected

Lecture 4: Statistical hypothesis testing

General procedure for H testing:

  • Collect data - Perform test
  • If p-value < α, conclude Ho is likely false and reject it
  • If p-value ≥ α, conclude there is insufficient evidence that Ho is false and fail to reject it
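
The final steps of the procedure can be sketched in R using the test statistic reported in Exercise 8 (t = 2.0326, df = 207). The two-sided p-value formula below is the standard one for a t-test; the variable names are illustrative:

```r
alpha <- 0.05        # significance level, chosen before looking at the data
t_stat <- 2.0326     # test statistic reported by t.test() in Exercise 8
deg_free <- 207      # degrees of freedom, n - 1

# Two-sided p-value: area in both tails beyond |t| under the t distribution
p_value <- 2 * pt(-abs(t_stat), df = deg_free)
round(p_value, 4)
# [1] 0.0434

# Compare the p-value to alpha to reach a decision
decision <- if (p_value < alpha) "reject Ho" else "fail to reject Ho"
decision
# [1] "reject Ho"
```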

Final Exercise: Comprehensive Analysis

You’ve learned about standard normal distributions, z-scores, standard error, confidence intervals, and hypothesis testing. Now, put it all together with a comprehensive analysis of fish lengths from multiple lakes.

# Choose 3 lakes for comparison
lakes_to_compare <- c("Toolik", "E 01", "S 06")

# Filter data for these lakes
comparison_data <- sculpin_df %>%
  filter(lake %in% lakes_to_compare) %>%
  filter(!is.na(total_length_mm))

# 1. Calculate summary statistics for each lake
lake_stats <- comparison_data %>%
  group_by(lake) %>%
  summarize(
    mean_length = mean(total_length_mm, na.rm = TRUE),
    sd_length = sd(total_length_mm, na.rm = TRUE),
    n = sum(!is.na(total_length_mm)),
    se_length = sd_length / sqrt(n),
    ci_lower = mean_length - qt(0.975, n-1) * se_length,
    ci_upper = mean_length + qt(0.975, n-1) * se_length
  )

# Display the summary statistics
lake_stats
# A tibble: 3 × 7
  lake   mean_length sd_length     n se_length ci_lower ci_upper
  <chr>        <dbl>     <dbl> <int>     <dbl>    <dbl>    <dbl>
1 E 01          58.2      15.3    79     1.72      54.8     61.6
2 S 06          54.0      10.9   132     0.949     52.1     55.9
3 Toolik        51.7      12.0   208     0.834     50.1     53.3
# 2. Create boxplots to visualize the distributions
comparison_data %>%
  ggplot(aes(x = lake, y = total_length_mm, fill = lake)) +
  geom_boxplot() +
  labs(
    title = "Fish Length Distributions Across Lakes",
    x = "Lake",
    y = "Length (mm)"
  ) +
  theme_minimal()

# 3. Perform a t-test comparing each pair of lakes
# Note: this is NOT the recommended approach for more than two groups
# (we will show the proper method later), but it is good practice

Based on this analysis, write a short summary of your findings:

  1. Are there significant differences in fish lengths between the lakes?
  2. Which lakes have the largest and smallest fish on average?
  3. How do the confidence intervals compare across lakes?
  4. What might explain these differences in fish lengths between lakes?
